(57) Abstract

The present invention relates to the NS3 serine protease of hepatitis C virus (HCV), and in particular to the observation that the NS3 serine
protease domain, in its native conformation, binds a Zn2+ ion and that bivalent metal ions are necessary for the structural integrity of
the protein and for the activity of the enzyme. The present invention further relates to recombinant polypeptides which comprise sequences
of the NS3 protease and are characterised by a tail of at least three lysines at their C-terminal ends, which increases their solubility. A further
subject of the present invention is a new process which allows the expression of said polypeptides, as metalloproteins with the proteolytic
activity of the HCV NS3 protease, in a soluble form and in a quantity sufficient to allow research to identify inhibitors and to determine
the three-dimensional structure of the NS3 protease. Figure 4 shows the effects of the zinc ion on the production of the HCV NS3 protease
as a soluble protein in E. coli in a minimal culture medium.
SOLUBLE POLYPEPTIDES WITH ACTIVITY OF THE NS3 SERINE
PROTEASE OF HEPATITIS C VIRUS, AND PROCESS FOR THEIR
PREPARATION AND ISOLATION

DESCRIPTION 

The hepatitis C virus (HCV) is the main etiologic 
agent of non-A, non-B hepatitis (NANB) . It is estimated 
that HCV causes at least 90% of post-transfusional NANB 
viral hepatitis and 50% of sporadic NANB hepatitis. 
Although great progress has been made in the selection of 
blood donors and in the immunological characterisation of 
blood used for transfusions, there is still a high level 
of acute HCV infection among those receiving blood 
transfusions, resulting in one million or more infections 
every year throughout the world. Approximately 50% of HCV 
infected individuals develop cirrhosis of the liver 
within a period that can range from 5 to 40 years, and 
recent clinical studies suggest that there is a
correlation between chronic HCV infection and the 
development of hepatocellular carcinoma. 

HCV is an enveloped virus containing a positive-sense RNA
genome of approximately 9.4 kb. This virus is a member of
the Flaviviridae family, the other members of which are
the pestiviruses and flaviviruses.

The RNA genome of HCV has recently been sequenced. 
Comparison of sequences from the HCV genomes isolated in 
various parts of the world has shown that these sequences 
can be extremely heterogeneous. Most of the HCV genome is 
occupied by an open reading frame (ORF) that can vary 
between 9030 and 9099 nucleotides. This ORF codes for a 
single viral polyprotein, the length of which accordingly
varies from 3010 to 3033 amino acids. During the
virus infection cycle, the polyprotein is proteolytically 
processed into the individual gene products necessary for 
replication of the virus. 
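
The relationship between the ORF lengths and the polyprotein lengths quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name is hypothetical, and it assumes (as the quoted figures imply) that the nucleotide counts exclude the stop codon, so that every three nucleotides encode one amino acid.

```python
def polyprotein_length(orf_nucleotides: int) -> int:
    """Return the number of amino acids encoded by an ORF.

    Assumption: the nucleotide count excludes the stop codon, so the
    length must be an exact multiple of three.
    """
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return orf_nucleotides // 3


# The two extremes of ORF length quoted in the text.
for nt in (9030, 9099):
    print(f"{nt} nt -> {polyprotein_length(nt)} amino acids")
```

Both extremes reproduce the polyprotein lengths given in the text, which confirms that the two ranges are mutually consistent.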

The genes coding for the HCV structural proteins are
located at the 5' end of the ORF, whereas the region
coding for the non-structural proteins occupies the rest
of the ORF. The structural proteins consist of: C (core,
21 kDa), E1 (envelope, gp37) and E2 (NS1, gp61). C is a
non-glycosylated protein of 21 kDa, which probably forms
the viral nucleocapsid. The protein E1 is a glycoprotein
of approximately 37 kDa and is believed to be a
structural protein of the outer viral envelope. E2,
another membrane glycoprotein of 61 kDa, is probably a
second structural protein of the outer envelope of the
virus.

The non-structural region starts with NS2 (p24), a
hydrophobic protein of 24 kDa whose function is not
known. NS3, a protein of 68 kDa which follows NS2 in the
polyprotein, has two functional domains: a serine
protease domain in the first 180 amino-terminal amino
acids and an RNA-dependent ATPase domain in the
carboxy-terminal part. The gene region corresponding to NS4 codes
for NS4A (p6), a membrane protein of 54 amino acids, and
NS4B (p26). The gene corresponding to NS5 codes for two
proteins, NS5A (p56) and NS5B (p65), of 56 and 65 kDa,
respectively. Recently it has been shown that the NS5B
region has an RNA-dependent RNA-polymerase activity (1).

Various molecular biological studies indicate that
the signal peptidase, a protease associated with the
endoplasmic reticulum of the host cell, is responsible
for proteolytic processing in the structural region,
that is to say at the sites C/E1, E1/E2 and E2/NS2 (2). A
first protease activity of HCV is responsible for the
cleavage between NS2 and NS3. This activity is contained
in a region comprising both a part of NS2 and the part of
NS3 containing the serine protease domain, but does not
use the same catalytic mechanism (3). By contrast,
the serine protease contained in the 180 amino acids at
the amino terminus of NS3 is responsible for cleavage at
the junctions between NS3 and NS4A, between NS4A and
NS4B, between NS4B and NS5A, and between NS5A and NS5B
(4-8). In particular it has been found that the cleavage
produced by this serine protease leaves a residue of
cysteine or threonine on the amino-terminal side
(position P1) and a residue of alanine or serine on the
carboxy-terminal side (position P1') of the substrate (6,
9). Recently it has been shown that NS4A binds the
N-terminal end of NS3 with its central hydrophobic portion,
thereby increasing the proteolytic activity of NS3 at all
the cleavage sites on the polyprotein (10-12).

Inhibition of the protease activity would therefore
stop the proteolytic processing of the non-structural
portion of the HCV polyprotein and, as a consequence,
would prevent virus replication in infected cells. This
sequence of events has been verified in a flavivirus,
a homologue of the hepatitis C virus, which infects cells
in culture.

In this case it has been possible to show that
genetic manipulation, producing a protease that is no
longer capable of exerting its catalytic activity,
abolishes the ability of the virus to replicate (13).
Furthermore it has been widely demonstrated, both in
vitro and in clinical studies, that compounds capable of
interfering with the activity of the HIV protease are
capable of inhibiting the replication of this virus (14).

Finally there is evidence that the NS5
region of HCV, which as mentioned above has an
RNA-dependent RNA-polymerase activity, does not display
this function except after processing by the NS3
protease.

Therefore a substance capable of interfering with
the proteolytic activity associated with the NS3 protein
could be a new therapeutic agent. From this point of view
detailed knowledge of the three-dimensional structure of
the protease takes on a great deal of importance, as it
would allow both a greater understanding of the
biological phenomena in which it is involved, and the
analysis, study and design of inhibitor molecules capable
of interfering with the protease activity, thus paving
the way for the development of pharmaceutical
compositions suitable for treatment of hepatitis C.
Nevertheless, determination of the structure, whether by
NMR methods or by X-ray crystallography, requires large
amounts of soluble protein, and at the present time it is
not possible to meet this requirement. In fact, although the
simplest and most economical manner of obtaining large
amounts of the desired polypeptide is expression of the
corresponding gene in bacteria, and although there is a
widespread availability of numerous eucaryotic promoters
and methods for maximising the expression of heterologous
genes in E. coli, efficient production of
the polypeptide in question, although necessary, might
not be sufficient. Many recombinant proteins do not fold
correctly when they are expressed
in E. coli. The result is the synthesis of polypeptides
which are either degraded in the host cell or
accumulated in an insoluble form in the so-called
inclusion bodies (15). Furthermore, in the case of
extremely hydrophobic proteins, proteins of viral origin
or proteins that are toxic for the bacterial cell (as is
the case for certain proteases of viral origin) there are
insurmountable difficulties in producing them in a
native, soluble form.

In the case of the NS3 serine protease of the
hepatitis C virus, due to the conditions in which the
protein is normally produced, it has not been possible to
date to obtain in E. coli a native, soluble protease
in amounts sufficient to enable the study of the
structural nature of this protein, which requires
solutions containing a high millimolar concentration of
the protein.

It has now unexpectedly been found that these
important limitations can be overcome by using the method
according to the present invention. As will be seen from
the following, this method is based on the unexpected
discovery that the NS3 serine protease domain, in its
native conformation, binds a Zn2+ ion.

Because, as mentioned above, the structure of the
HCV NS3 protease is not yet known, a structural model of
the protein was prepared, to be used as a guide during
experiments. However, the similarity of the NS3 protease
to other serine proteases of known structure is extremely
low (less than 15%), which does not allow good alignment
between sequences and as a result does not allow
construction of a three-dimensional model based solely on
homology. For this reason, the available serine protease
structures were used to build a multiple alignment of the
structurally conserved regions and in this way to draw up
a profile with which the sequence of the NS3 protease
could subsequently be aligned. In this way it was
possible to build an approximate three-dimensional model
of the HCV NS3 protease (9, 16).

Recently, three new viruses responsible for human
hepatitis have been discovered (17). These new viruses,
known as GBV-A, GBV-B and GBV-C, show a polyprotein
organisation in common with that of HCV (18, 19). From
alignment of the region corresponding to NS3 in these
three new viruses with that of various HCV serotypes,
several conserved amino acids were identified. These
residues comprise: the amino acids in the active site,
some glycines and prolines (probably involved in
stabilising the structure of the protein) and three
cysteines and one histidine (figure 1). In the model
suggested by us for the NS3 protease these last four
residues are found in a region of the molecule opposite
the active site, in a close spatial relationship, and
their relative position is such that they form a binding
site for a divalent metal ion, such as for example the
ion Zn2+ (figure 2).

This observation was subsequently confirmed
experimentally. In fact, as will be illustrated in
greater detail in the examples, the HCV NS3 protease
actually has a metal content equivalent to one mole of
zinc to each mole of protein, and as is the case in other
proteins the zinc is necessary to enable the protein to
take on its native structure and become catalytically
active (20, 21).

The fact that the NS3 protease has a binding site
for a metal ion and that this binding site is so well
conserved, even in viruses that are not phylogenetically
close, opens the way to the study of antiviral
therapeutic agents whose target site is this very region
of the protein. In fact, in the case of another viral
protein that binds Zn2+ ions, that is to say the HIV
nucleocapsid, it has been possible to identify
compounds that interfere selectively with the bond
between the protein and the Zn2+ ions (22, 23) and it has
also been seen that these compounds interfere with the
viral infection of cells grown in culture medium.

An object of the present invention is therefore to 
provide a method for high-yield expression, in a native 
form, that is to say as a protein containing a bivalent 
metallic ion, and in a highly soluble form of the HCV NS3 
protease using heterologous expression systems, such as 
E. coli cells transformed using suitable genetic 
constructs and cultivated in a medium enriched with salts 
containing divalent metal ions. 

A further object of the present invention is to 
provide a general method allowing preparation and 
isolation in a native, pure and highly soluble form, of 
large amounts of polypeptides containing Zn2+, Co2+ or
Cd2+, with the protease activity of HCV NS3.

Furthermore, an additional object of the present 
invention is to provide a method that allows preparation 
and isolation in a native, pure and highly soluble form 
of large amounts of polypeptides with the protease 
activity of HCV NS3, which are at the same time labelled
using stable heavy isotopes such as 13C or 15N, as
required for experiments to determine the
three-dimensional structure of the protein using NMR.

Finally, the present invention provides new genetic
constructs for the expression, in E. coli cells, of
modified polypeptides with the protease activity of HCV
NS3, giving a high yield of the native and soluble form
of the HCV NS3 protease.

These and other objects are achieved using one or 
more of the embodiments of the present invention 
described below. 

In an embodiment of the invention a procedure is
provided for obtaining production of the NS3 serine
protease domain in its native form, that is to say
containing a bivalent metal ion, which is necessary for
the structural integrity of the protein. The innovation
in the procedure consists in the addition to the culture
medium in which the transformed bacterial cells are grown
of compounds containing metals such as Zn, Co, Cd, Mn,
Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt, V. These compounds
provide the culture medium with the ions required by the
protein to take on its native structure. In this way the
protein is found in its native, soluble form in the
cytoplasm of the bacterial cells, instead of being held
in the inclusion bodies, from which it can only be
obtained by applying difficult resolubilisation
procedures.

In another embodiment of the invention, a procedure
is provided that makes it possible to replace the zinc
ion in the protease, which is spectroscopically silent,
with other ions (for example Co2+ or Cd2+), which are
spectroscopically active, so as to permit the study of
possible inhibitors capable of co-ordinating the metal
contained in the protein and therefore of disturbing the
bond between the protein and the metal.

In another embodiment of the invention, the addition
of bivalent metal ions to a minimal culture medium
containing glucose and ammonium salts enriched with 13C
or 15N as the sole sources of carbon and nitrogen,
respectively, makes it possible to obtain large amounts
of soluble protein labelled with stable heavy isotopes such
as 13C or 15N. This type of isotope enrichment is
necessary to determine the structure using NMR
techniques.

In a further embodiment of the present invention
polypeptide sequences are provided that contain the NS3
serine protease domain of hepatitis C virus, suitably
modified. These polypeptides are characterised in that
they have at their C-terminal end a sequence of extremely
hydrophilic amino acids, such as for example a series of
lysines, which are not present in the original sequence.
This modification gives a substantial
improvement in terms of solubility and integrity of the
protein produced. These modified protease molecules are
also to be considered as a subject of the present
invention.

Subjects of the present invention are therefore: 

a) Isolated and purified polypeptides containing the
HCV NS3 serine protease domain, characterised in that
they have at their C-terminal end a tail of at least
three lysines.

b) A process for the preparation of polypeptides
containing the HCV NS3 serine protease domain in a
soluble form, of use for enzymological experiments and for
determination of the three-dimensional structure of the
enzyme both by means of NMR and using X-ray
crystallography, comprising the following operations:

- transformation of a prokaryotic host cell with an
expression vector containing a DNA sequence coding for a
polypeptide with the proteolytic activity of the HCV NS3
protease;

- growth of the prokaryotic host cell on a special
culture medium containing Zn2+ or alternatively salts of
transition metals such as Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr,
Hg, Au, Pt, V;

- expression of the DNA sequence required to produce
the chosen polypeptide;

- purification of the polypeptide without having to
resort to resolubilisation protocols, and without the
need for renaturation of the protein from inclusion
bodies.

c) A process for the renaturation in vitro of the
above polypeptides, characterised in that it comprises
the following operations:

- transformation of a prokaryotic host cell with an
expression vector containing a DNA sequence coding for a
polypeptide with the proteolytic activity of HCV NS3
protease;

- expression of the DNA sequence required to produce
the chosen polypeptide;

- purification of the denatured polypeptide and
renaturation of the protein using buffers containing Zn2+
or alternatively salts of transition metals such as Co,
Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt, V.

d) Expression vectors for the production of the
polypeptides represented by the sequences SEQ ID NO:1 to
SEQ ID NO:4 with the proteolytic activity of HCV NS3,
comprising: a polynucleotide coding for one of said
polypeptides; regulation, transcription and translation
sequences, operating in said host cell, operably
linked to said polynucleotide; and, optionally, a
selectable marker.

e) A prokaryotic cell transformed with an expression
vector containing a DNA sequence coding for polypeptides
with the proteolytic activity of the HCV NS3 protease, so
as to allow said host cell to express the specific
polypeptide which is coded in the chosen sequence.

Figure 1 shows the alignment of the NS3 serine protease
sequences of HCV and of the viruses GBV-A, GBV-B and
GBV-C/HGV (Hcv, Hga, Hgb, Hgc) with the poliovirus (Pol)
2A cysteine protease. Amino acids conserved in the
proteases of HCV and of the viruses GBV-A, GBV-B and GBV-C/HGV
are shaded. The catalytic residues are underlined and
the residues that bind zinc are indicated using the
symbol _.

Figure 2 shows a diagrammatic model of the NS3
serine protease domain. In particular it shows the
position within the structure of the amino acids involved
in binding zinc (dark grey) and the catalytic triad
(light grey).

Figure 3 shows the effect of the zinc ion on HCV NS3 
serine protease activity. 

Figure 4 shows the effects of the zinc ion on the
production of HCV NS3 protease as a soluble protein in E.
coli on a minimal culture medium. Column 2 refers to the
results of the experiment carried out on the cells
without inducing protease production (-IPTG). Columns 3,
4 and 5 indicate that in the absence of ZnCl2 and
following the induction of protease production (+IPTG)
the protein remains locked in the insoluble portion
(indicated by the abbreviation PT). By contrast, in
the presence of ZnCl2 the protease is found entirely in
the soluble portion (indicated by the abbreviation SN).

Figure 5 shows the electronic spectra of the HCV
NS3 protease. Figure 5a shows the visible and near-UV
spectrum of the Co2+-protease. Figure 5b shows the UV
absorption spectra of the Zn2+-protease and of the
Cd2+-protease.

DEPOSITS

Strains of E. coli DH1/p bacteria transformed with
the plasmids pT7-7(Pro BK-asK4), pT7-7(Pro J-asK4),
pT7-7(Pro H-asK4) and pT7-7(Pro J8-asK4) and coding for the
amino acid sequences SEQ ID NO:1, SEQ ID NO:2, SEQ ID
NO:3 and SEQ ID NO:4, respectively, were deposited on
August 8, 1996 with The National Collections of
Industrial and Marine Bacteria Ltd (NCIMB), Aberdeen,
Scotland, U.K., under accession numbers NCIMB 40821, NCIMB
40822, NCIMB 40823 and NCIMB 40824, respectively.

Up to this point a general description has been
given of the present invention. With the aid of the
following examples a more detailed description of
specific embodiments of the invention will now be given,
with the aim of clarifying the objects, characteristics,
advantages and methods of application thereof.
EXAMPLE 1

EXPRESSION AND PURIFICATION OF POLYPEPTIDES WITH THE
PROTEASE ACTIVITY OF HCV NS3, IN THEIR NATIVE SOLUBLE
FORM

The plasmids pT7-7(Pro BK-asK4), pT7-7(Pro H-asK4),
pT7-7(Pro J-asK4) and pT7-7(Pro J8-asK4) were constructed
to allow expression in E. coli of polypeptides
having a sequence chosen from
the group from SEQ ID NO:1 to SEQ ID NO:4.
The polypeptides contain the NS3 protease domain of
various HCV isolates (BK, H, J and J8, respectively) with
the addition of a "tail" of four lysines at the
C-terminal end.

pT7-7(Pro BK-asK4) contains the sequence for HCV-BK
(EMBL data bank accession number: M58335) between the
nucleotides 3411 and 3950, cloned in the vector pT7-7.

pT7-7(Pro H-asK4) contains the sequence for HCV-H
(EMBL data bank accession number: M67463) between the
nucleotides 3420 and 3959, cloned in the vector pT7-7.

pT7-7(Pro J-asK4) contains the sequence for HCV-J
(EMBL data bank accession number: D90208) between the
nucleotides 3408 and 3947, cloned in the vector pT7-7.

pT7-7(Pro J8-asK4) contains the sequence for HCV-J8
(EMBL data bank accession number: D10988/D01221) between
the nucleotides 3432 and 3971, cloned in the vector
pT7-7.
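
As a quick consistency check, the nucleotide ranges quoted for the four constructs can be converted into insert sizes and codon counts. The sketch below is illustrative only: the dictionary and function names are hypothetical, and it counts only the HCV-derived part of each insert, not any residues added through the primers.

```python
# Inclusive nucleotide ranges quoted in the text for each construct.
CONSTRUCT_RANGES = {
    "pT7-7(Pro BK-asK4)": (3411, 3950),
    "pT7-7(Pro H-asK4)": (3420, 3959),
    "pT7-7(Pro J-asK4)": (3408, 3947),
    "pT7-7(Pro J8-asK4)": (3432, 3971),
}


def insert_size(first: int, last: int) -> int:
    """Length in nucleotides of an inclusive range first..last."""
    return last - first + 1


def codon_count(first: int, last: int) -> int:
    """Number of whole codons in the range (must be a multiple of three)."""
    size = insert_size(first, last)
    if size % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return size // 3


for name, (first, last) in CONSTRUCT_RANGES.items():
    print(name, insert_size(first, last), "nt,", codon_count(first, last), "codons")
```

Every range spans the same number of codons, which matches the statement earlier in the description that the serine protease domain lies in the first 180 amino-terminal amino acids of NS3.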

The expression vector pT7-7 is a derivative of
pBR322 which contains, in addition to the gene for
β-lactamase and the replication origin of ColE1, the
promoter and the ribosome binding site of the T7
bacteriophage Φ10 gene (24).

The fragments coding for the HCV NS3 protease were
cloned downstream of the T7 bacteriophage Φ10 promoter,
in reading frame with the first ATG codon of the gene 10
protein of phage T7 using methods known to the art.

The cDNA fragment containing the sequence HCV-BK
between nucleotides 3411 and 3950 was amplified by
Polymerase Chain Reaction (PCR), using the
oligonucleotides PROT(BK-K4)S (SEQ ID NO:5) and
PROT(BK-K4)AS (SEQ ID NO:6) as primers. The cDNA fragment so
obtained was digested with the restriction enzyme NdeI,
and cloned in pT7-7, which was first linearised with the
restriction enzymes NdeI and SmaI.

The cDNA fragment containing the sequence HCV-H
between nucleotides 3420 and 3959 was amplified by PCR,
using the oligonucleotides PROT(H-K4)S (SEQ ID NO:7) and
PROT(H-K4)AS (SEQ ID NO:8) as primers. The cDNA fragment
so obtained was digested with the restriction enzymes
NdeI and EcoRI, and cloned in pT7-7, which was first
linearised with the same restriction enzymes.

The cDNA fragment containing the sequence HCV-J
between nucleotides 3408 and 3947 was amplified by PCR,
using the oligonucleotides PROT(J-K4)S (SEQ ID NO:9) and
PROT(J-K4)AS (SEQ ID NO:10) as primers. The cDNA
fragment so obtained was digested with the restriction
enzymes NdeI and EcoRI, and cloned in pT7-7, which was
first linearised with the same restriction enzymes.

The cDNA fragment containing the sequence HCV-J8
between nucleotides 3432 and 3971 was amplified by PCR,
using the oligonucleotides PROT(J8-K4)S (SEQ ID NO:11)
and PROT(J8-K4)AS (SEQ ID NO:12) as primers. The cDNA
fragment so obtained was digested with the restriction
enzymes NdeI and EcoRI, and cloned in pT7-7, which was
first linearised with the same restriction enzymes.

The plasmids pT7-7(Pro BK-asK4), pT7-7(Pro H-asK4),
pT7-7(Pro J-asK4) and pT7-7(Pro J8-asK4) containing NS3
sequences also contain the gene for β-lactamase, which
can be used as a selection marker for E. coli cells
transformed with these plasmids.

The plasmids are then transformed into the E. coli
strain BL21(DE3), normally used for high-level
expression of genes cloned in expression vectors
containing the T7 promoter. In this strain the T7
polymerase gene is carried on the bacteriophage λ DE3,
which is integrated into the chromosome of BL21 cells
(25). Expression of the gene is induced by incubating
the cultures at an A600 nm of 0.7-0.9 with 0.4 mM
isopropyl-1-thio-β-D-galactopyranoside (IPTG) for 3 hours
at 20°C in LB culture medium supplemented with ZnCl2 at a
concentration that can vary from 50 µM to 1 mM. After
the three hours have passed the cells are harvested and
washed in a phosphate-buffered saline solution (20 mM
sodium phosphate pH 7.5, 140 mM NaCl), after which they
are re-suspended in 25 mM sodium phosphate pH 7.5, 10%
glycerol, 500 mM NaCl, 10 mM DTT, 0.5% CHAPS (10 ml per
litre of culture medium). The cells are then lysed by
passing twice through a French pressure cell and the
homogenate obtained in this way is centrifuged at
100,000 x g for 1 hour, while the nucleic acids are removed
by precipitation with 0.5% polyethylenimine. The
supernatants are loaded onto a HiLoad 16/10 SP Sepharose
High Performance column (Pharmacia) equilibrated with 50
mM sodium phosphate pH 7.5, 5% glycerol, 3 mM DTT,
0.1% CHAPS (buffer A). The column was washed
repeatedly with buffer A and the protease was eluted by
applying a gradient of from 0 to 0.6 M NaCl. The
fractions containing the protease were then collected and
concentrated using a chamber for ultrafiltration under
magnetic stirring, equipped with a YM-10 membrane
(Amicon). The sample was then loaded onto an HR 26/60
HiLoad Superdex 75 column (Pharmacia), equilibrated with
buffer A, operating at a flow rate of 1 ml/min.
The fractions containing NS3 were collected and
further purified on an HR 5/5 Mono S column (Pharmacia),
equilibrated with buffer B and operating at a flow rate of 1
ml/min. The protease was eluted from the column in pure
form by applying a linear gradient of 0-0.6 M NaCl in buffer
A.

After this step the protein was stored in
stocks at concentrations of 50-150 µM at a temperature of
-80°C after freezing in liquid nitrogen. The
concentration of the protein was estimated by
determination of absorbance at 280 nm using an extinction
coefficient derived from the sequence data or from
quantitative amino acid analysis. Both methods give
the same results, with an error factor of 10%. The
purity of the enzyme was ascertained on SDS
polyacrylamide gel and by HPLC using a reverse-phase
Vydac C4 column (4.6x250 mm, 5 µm, 300 Å). The eluents
used were H2O/0.1% TFA (A) and acetonitrile/0.1% TFA (B).
A linear gradient of from 3% to 95% B over 60 minutes was
used. Analysis of the N-terminal end by
Edman degradation on a gas-phase sequencer
(Applied Biosystems model 470A) and analysis by mass
spectrometry revealed that more than 96% of the purified
protein has the N-terminal sequence PITAYSSQ. The
remaining 3% has the sequence MAPITAYSSQ as expected from
the data on the nucleotide sequence.
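
The linear HPLC gradient described above (3% to 95% eluent B over 60 minutes) can be written as a simple interpolation. The following is a minimal sketch for illustration; the function name and its default parameters are assumptions taken from the figures in the text, not part of any instrument software.

```python
def percent_b(t_min: float, start: float = 3.0, end: float = 95.0,
              duration: float = 60.0) -> float:
    """Percentage of eluent B at time t_min for a linear gradient.

    Defaults follow the gradient quoted in the text: 3% to 95% B
    over 60 minutes.
    """
    if not 0.0 <= t_min <= duration:
        raise ValueError("time lies outside the gradient")
    return start + (end - start) * t_min / duration


# Composition at the start, midpoint and end of the run.
for t in (0, 30, 60):
    print(f"t = {t:2d} min: {percent_b(t):.1f}% B")
```

A helper of this kind makes it easy to relate the retention time of a peak to the acetonitrile content at which the protein elutes.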

In order to measure the enzymatic activity of the
purified protein, a synthetic peptide of 13 amino acids
was used as a substrate. This peptide was derived from
the cleavage sequence of the NS4A-NS4B junction
(DEEMECSSHLPYK). A peptide with 14 amino acids
corresponding to the central hydrophobic region of the
protein NS4A (from position 21 to position 34)
(Pep4A21-34: GSVVIVGRIILSGR) was used as a protease cofactor.

The peptides were synthesised by solid phase
synthesis based on Fmoc chemistry. After cleavage and
deprotection, the crude peptides were purified by HPLC to
98% purity. The identity of the peptides was determined
by mass spectrometry. Stock solutions of the peptides were
prepared in DMSO and stored at -80°C; their
concentrations were determined by quantitative amino acid
analysis carried out on samples hydrolysed with HCl.

The cleavage tests were carried out using 300 nM -
1.6 µM of enzyme in 30 µl of 50 mM Tris pH 7.5, 50%
glycerol, 2% CHAPS, 30 mM DTT and appropriate amounts of
substrate and/or peptide-NS4A at 22°C. The reaction was
stopped by addition of 70 µl of H2O containing 0.1% TFA.
Cleavage of the peptide substrate was determined by HPLC
using a Merck-Hitachi chromatograph. After this, 90 µl of
each sample were injected into a reverse-phase
Lichrospher C18 cartridge column (4x125 mm, 5 µm, Merck)
and the fragments were separated using an acetonitrile
gradient of 3-100% at 2%/min. Identification of the peaks
was achieved by following both the absorbance at 220 nm and
the fluorescence of the tyrosine (λex = 260 nm, λem = 305
nm).
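
The bookkeeping behind the cleavage assay can be sketched as follows. This is a rough illustration only: the volume units are garbled in the source, so the code assumes a 30 µl reaction, a 70 µl quench and a 90 µl injection, and all names are hypothetical.

```python
# Assumed volumes in microlitres (units are unclear in the source text).
REACTION_UL = 30.0
QUENCH_UL = 70.0
INJECTED_UL = 90.0


def quenched_volume_ul(reaction_ul: float = REACTION_UL,
                       quench_ul: float = QUENCH_UL) -> float:
    """Total sample volume after the reaction is stopped."""
    return reaction_ul + quench_ul


def injected_fraction(injected_ul: float = INJECTED_UL) -> float:
    """Fraction of the quenched sample that is loaded onto the column."""
    return injected_ul / quenched_volume_ul()


def gradient_minutes(start_pct: float = 3.0, end_pct: float = 100.0,
                     slope_pct_per_min: float = 2.0) -> float:
    """Duration of a linear gradient from start_pct to end_pct."""
    return (end_pct - start_pct) / slope_pct_per_min


print(quenched_volume_ul(), injected_fraction(), gradient_minutes())
```

Under these assumptions most of each quenched sample is analysed, and the acetonitrile gradient of 3-100% at 2%/min takes a little under 50 minutes.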

Tables 1 and 2 give the data for solubility and
yield relating to the NS3 protease corresponding to
various HCV isolates. Table 1 gives the data for
production of the various forms of protease both with and
without the addition of four lysines at the C-terminal
end, and both with and without the addition of ZnCl2 to
the culture medium. The data are expressed as the
percentage of protein recovered in the soluble fraction
of the cell extracts and the percentage found in the
inclusion bodies. Table 2 gives the yields and solubility
of the various forms of protease, purified from E. coli
cells grown in the presence of ZnCl2. As can be seen
from the results given, the modified proteases (BK-asK4,
J-asK4, H-asK4) are between 10 and 20 times more soluble
and, when expressed in a culture medium containing an
excess of ZnCl2, they give a yield up to 10 times greater
than the respective proteases without the lysine tail.

TABLE 1

Construct      Culture medium   Soluble portion   Inclusion bodies

Pro J          LB               5%                95%
Pro J-asK4     LB               20%               80%
Pro J          LB + ZnCl2       99%               <1%
Pro J-asK4     LB + ZnCl2       99%               <1%
Pro H          LB               <2%               >98%
Pro H-asK4     LB               3-4%              >95%
Pro H          LB + ZnCl2       5%                95%
Pro H-asK4     LB + ZnCl2       50%               50%

TABLE 2

Construct      Yield (mg/l medium)   Solubility (mg/ml)

Pro BK         1-2                   1-2
Pro BK-asK4    10-15                 >40
Pro H          0.1-0.2               1-2
Pro H-asK4     1-2                   >40
Pro J          1-2                   0.5-1
Pro J-asK4     15-20                 >10
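
The yield improvement claimed for the lysine-tailed constructs can be estimated from the ranges in Table 2. The sketch below is illustrative: it compares the midpoints of the reported ranges, which is an assumption made here for simplicity, and the names used are hypothetical.

```python
# Yield ranges in mg per litre of medium, as reported in Table 2.
TABLE_2_YIELD = {
    "BK": {"unmodified": (1.0, 2.0), "asK4": (10.0, 15.0)},
    "H": {"unmodified": (0.1, 0.2), "asK4": (1.0, 2.0)},
    "J": {"unmodified": (1.0, 2.0), "asK4": (15.0, 20.0)},
}


def midpoint(value_range: tuple) -> float:
    """Midpoint of a (low, high) range."""
    low, high = value_range
    return (low + high) / 2.0


def fold_increase(isolate: str) -> float:
    """Ratio of midpoint yields: lysine-tailed over unmodified protease."""
    entry = TABLE_2_YIELD[isolate]
    return midpoint(entry["asK4"]) / midpoint(entry["unmodified"])


for isolate in TABLE_2_YIELD:
    print(f"Pro {isolate}: about {fold_increase(isolate):.1f}-fold higher yield")
```

On this midpoint basis the three isolates show an improvement in yield of the order of ten-fold, in line with the statement in the text.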


EXAMPLE 2

DETERMINATION OF THE METAL CONTENT OF POLYPEPTIDES WITH
THE PROTEOLYTIC ACTIVITY OF HCV NS3 PROTEASE

The polypeptides purified according to the procedure
described in examples 1, 3 and 5 were further dialysed
against buffers containing a chelating agent, in order to
remove any metal ions bound to the protein, and their
metal content was determined by atomic absorption
spectrometry using a Perkin-Elmer spectrometer. The
glassware used for analysis of the metal content was
washed with 30% nitric acid and rinsed thoroughly with
deionised water. The protease (at a concentration of
4 mg/ml) was dialysed for at least 16 hours against a
buffer containing 50 mM Tris/HCl pH 7.5, 3 mM DTT, 10%
glycerol, 0.1% CHAPS. Chelex-100 resin (2.5 g/l) was held
in suspension in the dialysis buffer to prevent
contamination by adventitious metal ions. The protein was
then hydrolysed with nitric acid and used to determine
the metal content. The standardised Zn2+, Co2+ and Cd2+
solutions were purchased from Merck.

The metal content was found to be 1 g-atom per
mole of enzyme (see table 3; n.d. = not determined), with
the exception of the apoprotein, which has a
negligible metal content.


TABLE 3

Protein     Zn (g-atoms/mole)   Co (g-atoms/mole)   Cd (g-atoms/mole)

Zn2+-NS3    1.09                n.d.                n.d.
Apo-NS3     0.02                n.d.                n.d.
Co2+-NS3    0.19                0.90                n.d.
Cd2+-NS3    0.09                n.d.                1.15

n.d.: not determined

EXAMPLE 3

PROCEDURE FOR THE RENATURATION OF THE NS3 PROTEASE IN
THE PRESENCE OF ZINC

To ascertain whether or not zinc is required for HCV
NS3 serine protease activity, its proteolytic activity
was first measured on a synthetic substrate peptide.
This measurement was carried out in the presence of
increasing concentrations of EDTA or of
1,10-phenanthroline. It was found that these two compounds
do not inhibit proteolysis by NS3 at concentrations lower
than 1 mM. Above this concentration both EDTA and
1,10-phenanthroline show only a modest level of
inhibition of NS3 activity. However, similar inhibition
behaviour was obtained in control experiments using
compounds structurally similar to 1,10-phenanthroline
that are not capable of chelating zinc ions, and the
activity was not restored in the presence of an excess

of Zn2+ ions. These results suggest that either zinc is
not required for enzymatic activity, or that it is so
strongly bound to the protein that it cannot be removed
by treatment with chelating agents. It was therefore
decided to proceed with preparation of a protein
containing no zinc (apoprotein) and to measure its
biochemical activity in the absence and in the presence
of this metal. Bound zinc cannot be removed by dialysis
against chelators at a pH exceeding 7, whereas
prolonged dialysis of the enzyme at a pH of
less than 5 and in the presence of 10 mM EDTA causes a
loss of zinc accompanied by irreversible precipitation of
the sample. The above observations suggest that the zinc
is strongly bound and that it is essential for the
structural integrity of the protein. In order to
facilitate the release of zinc, the apoprotein was
obtained by applying the following procedure: 1.7 mg of
NS3 protease were denatured by addition of TFA to a
final concentration of 1%. The denatured protein was
then purified on a Resource RPC 3 ml column using an
acetonitrile gradient of from 0% to 85% in the presence of
0.1% TFA. The flow rate of the column was
2 ml/min and the volume of the gradient was [...] ml. The
zinc content of the apoprotein was found to be
negligible. The enzymatic activity of the apoprotein was
then tested in the presence and in the absence of zinc.
The apoprotein was diluted to a final concentration of 60
nM in the activity buffer containing the concentrations
of ZnCl2 shown in the graph and 10 mM DTT to prevent
oxidation of the thiol groups. After an incubation
period of 1 hour at 22°C the reaction was started by
adding the substrate peptide at a concentration of 40 mM.
The reaction was then allowed to proceed for another hour
before taking the measurements. As shown in figure 3,
reconstitution of the enzymatic activity depends on the
concentration of zinc ions in the buffer. Maximum
reactivation was observed at a ZnCl2 concentration of 25
µM. At this concentration the enzymatic activity is
found to be approximately 50% of that of the
protease containing zinc (diluted in the same buffer at
the same final concentration). This experiment gives
unequivocal proof that zinc is necessary in order for the
enzyme to be structurally complete and active, and it
also provides a method for reconstitution of NS3 serine
protease activity starting from the apoprotein.

EXAMPLE 4

PROCEDURE FOR THE PRODUCTION OF HCV NS3 PROTEASE IN A FORM
[...] DETERMINATION OF THE THREE-DIMENSIONAL STRUCTURE
THEREOF USING NMR TECHNIQUES

The discovery that HCV NS3 protease contains a 
structural zinc atom has been used to increase the 
production of soluble protein in bacterial cells (E. 
coli) and therefore to produce a protein in a form that 
can be used for experiments aimed at determining the 
structure by means of NMR. 

Indeed, determination of structure by means of
NMR involves metabolic labelling with 15N and 13C, to be
carried out in a minimal culture medium, for example
modified M9 culture medium ((NH4)2SO4 1 g/l, K-phosphate 100
mM, MgSO4 0.5 mM, CaCl2 0.5 mM, biotin 5 µM, thiamine 7 µM,
ampicillin 5 µg/ml, glucose 4 g/l, FeSO4·7H2O 13 µM).
Induction in this culture medium, which does not include
zinc salts in its composition, inevitably results in the
production of insoluble protein, whereas the addition of
50 µM ZnCl2 results in the production of a completely
soluble protease. In this way it is possible to produce
a labelled protein using (15NH4)2SO4 as a source of nitrogen
and 13C-glucose as a source of carbon.
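To give a sense of scale for the zinc supplement, the following Python sketch converts the 50 µM ZnCl2 addition into a mass per litre of medium. It is an illustration only: it assumes the anhydrous salt and rounded standard atomic masses, and the function name is hypothetical.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"Zn": 65.38, "Cl": 35.45}


def zncl2_mg_per_litre(target_micromolar: float) -> float:
    """Mass of anhydrous ZnCl2 (mg) needed per litre for a target concentration in µM."""
    molar_mass = ATOMIC_MASS["Zn"] + 2 * ATOMIC_MASS["Cl"]  # g/mol
    moles_per_litre = target_micromolar * 1e-6
    return moles_per_litre * molar_mass * 1000.0  # convert g to mg


print(f"{zncl2_mg_per_litre(50):.2f} mg ZnCl2 per litre for 50 µM")
```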

Following this new procedure, a protein has been
obtained that remains in a soluble form in the cytoplasm
and is not sequestered in inclusion bodies, as was the
case using the old procedures. In this way the
resolubilisation procedures become unnecessary, which
is a considerable advantage, as these procedures
have an extremely variable yield, require tightly
controlled conditions and also frequently cause
irreversible alterations in the protein. Figure 4 shows
how the protease (at approximately 21 kDa, indicated in
the figure by an arrow) is produced as an insoluble
aggregate (PT) when the bacterial cells are grown in
minimal culture medium without zinc (lanes 3, 4 and 5).
By contrast, if ZnCl2 is added to the culture medium
at a concentration of 50 µM the protein is found in the
soluble fraction (SN) (lanes 6, 7 and 8) and disappears
from the insoluble fraction (PT).
EXAMPLE 5

REPLACEMENT OF THE Zn2+ BOUND TO NS3 WITH SPECTROSCOPIC [...]

The Zn2+ binding site of the HCV NS3 protease
can be studied by replacing the zinc with metals
that make spectroscopic studies possible. The tight
binding of the structural zinc to the enzyme makes it
difficult to remove the metal and replace it in vitro.
As a result, the Zn2+ was replaced by Co2+ and Cd2+ by
incorporation in vivo. The bacterial cells (E. coli)
were transformed with an appropriate expression vector
and grown in minimal culture medium containing 100 mM
potassium phosphate at pH 7.0, 0.5 mM MgSO4, 0.5 mM
CaCl2, 13 µM FeSO4, 7 µM thiamine, 6 µM biotin. Glucose
(4 g/l) and (NH4)2SO4 (1 g/l) were used as sources of
carbon and nitrogen, respectively. To reduce the amount
of Zn2+ in the culture medium, the phosphate buffer was
passed through a Chelex-100 column. To obtain
production of Co2+- or Cd2+-NS3, 50 mM of CoCl2 or CdCl2
was added, respectively, 20 minutes before addition of
IPTG. Purification of the Co2+- and Cd2+-proteases was
carried out using the procedure described in example 1,
except that all the buffers used were
treated with Chelex-100 resin (2.5 g/l) and the DTT was
omitted.

The addition of CoCl2 or CdCl2 to the culture medium
still results in production of the soluble enzyme, which
indicates that the Co2+ and Cd2+ ions can replace zinc in
the metal binding site of the protease.

The proteases containing Co2+ and Cd2+ were subjected
to electronic absorption spectroscopic analysis. The
protease containing Co2+ shows a typical absorption
spectrum in the visible region (figure 6a), which
indicates a binding site with a tetrahedral geometry
(26). The two main bands at 640 nm and at 685 nm and the
minima at 585 nm and at 740 nm indicate d-d
transitions. The energy of these transitions and the
molar extinction coefficients are characteristic of
complexes with a distorted tetrahedral co-ordination
geometry (27). The d-d transition energy is consistent
with a mixed sulphur-nitrogen co-ordination.
Furthermore, the centroid of the band corresponding to
the d-d transition indicates a Co2+ complex with S3N
co-ordination (26). A typical S -> Co2+ charge transfer
band was observed at around 365 nm (figure 6a), implying
that the metal ion is co-ordinated by thiolates.
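Because the argument above is framed in terms of transition energies while the bands are quoted as wavelengths, the following Python sketch converts the reported wavelengths into wavenumbers and photon energies. It is a unit conversion only, with hypothetical function names, and does not reproduce any analysis from the cited references.

```python
def wavenumber_cm1(wavelength_nm: float) -> float:
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV, using hc ~ 1239.84 eV nm."""
    return 1239.84 / wavelength_nm


# Band positions quoted in the text for the Co2+-protease.
BANDS = [("d-d band", 640), ("d-d band", 685), ("S -> Co2+ charge transfer", 365)]

for label, nm in BANDS:
    print(f"{label} at {nm} nm: {wavenumber_cm1(nm):.0f} cm^-1, {photon_energy_ev(nm):.2f} eV")
```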

In accordance with these data, the UV absorbance
spectrum of the Cd2+-protease (figure 6b) shows an
increase in absorbance at around 250 nm, which in all
probability is due to a S -> Cd2+ charge transfer band
(28). In conclusion, spectroscopic analysis of the Co2+-
and Cd2+-proteases is completely consistent with the
three-dimensional model proposed by us. In fact, in the
model the binding site for the metal is made up of the
thiol groups of three cysteines and of a nitrogen atom
from the side chain of a histidine. Each of the residues
that according to the model form the binding site for the
metal has been changed to alanine and, as expected, none
of the mutants obtained is capable of being expressed in
a soluble form in E. coli.
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SEQUENCE LISTING 

GENERAL INFORMATION
(i) APPLICANT:
    ISTITUTO DI RICERCHE DI BIOLOGIA MOLECOLARE P. ANGELETTI S.p.A.
(ii) TITLE OF INVENTION: "SOLUBLE POLYPEPTIDES WITH ACTIVITY OF
    THE NS3 SERINE PROTEASE OF HEPATITIS C VIRUS, AND PROCESS FOR
    THEIR PREPARATION AND ISOLATION"
(iii) NUMBER OF SEQUENCES: 12
(iv) MAILING ADDRESS:
    (A) ADDRESSEE: Societa' Italiana Brevetti
    (B) STREET: Piazza di Pietra, 39
    (C) CITY: Rome
    (D) COUNTRY: Italy
    (E) POST CODE: I-00186
(v) COMPUTER-READABLE FORM:
    (A) TYPE OF SUPPORT: Floppy disk 3.5" 1.44 MBYTES
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS Rev. 5.0
    (D) SOFTWARE: Microsoft Word 6.0
(viii) AGENT INFORMATION
    (A) NAME: DI CERBO Mario (Dr.)
    (B) REFERENCE: RM/X88878/PC-DC
(ix) TELECOMMUNICATIONS INFORMATION
    (A) TELEPHONE: 06/6785941
    (B) TELEFAX: 06/6794692
    (C) TELEX: 612287 ROPAT


(1) INFORMATION ON SEQUENCE SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 187 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ix) FEATURE
    (A) NAME: Pro BK-asK4
    (D) OTHER INFORMATION: sequence for the NS3
        protease of HCV isolate BK.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Met Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
1               5                   10                  15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
            20                  25                  30

Glu Val Gln Val Val Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys
        35                  40                  45

Val Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr
    50                  55                  60

Leu Ala Gly Pro Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp
65                  70                  75                  80

Gln Asp Leu Val Gly Trp Gln Ala Pro Pro Gly Ala Arg Ser Leu Thr
                85                  90                  95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
            100                 105                 110

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
        115                 120                 125

Ser Pro Arg Pro Val Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
    130                 135                 140

Leu Cys Pro Ser Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
145                 150                 155                 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Val Pro Val Glu Ser Met
                165                 170                 175

Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
            180                 185
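The stated length and the C-terminal lysine tail of SEQ ID NO: 1 can be checked mechanically. The following Python sketch is an illustration only: it takes the sequence as transcribed from this listing using standard three-letter codes (Ile, Gln), converts it to one-letter codes, and reports its length, its last six residues and the positions of cysteine and histidine. The helper names are hypothetical, and positions are counted with the N-terminal Met as residue 1.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# SEQ ID NO: 1 (Pro BK-asK4), transcribed from the sequence listing.
SEQ_ID_1 = """
Met Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
Glu Val Gln Val Val Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys
Val Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr
Leu Ala Gly Pro Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp
Gln Asp Leu Val Gly Trp Gln Ala Pro Pro Gly Ala Arg Ser Leu Thr
Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
Ser Pro Arg Pro Val Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
Leu Cys Pro Ser Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
Thr Arg Gly Val Ala Lys Ala Val Asp Phe Val Pro Val Glu Ser Met
Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
"""


def to_one_letter(three_letter_text):
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_text.split())


def positions(sequence, residue):
    """Return the 1-based positions of a residue in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa == residue]


seq = to_one_letter(SEQ_ID_1)
print("length:", len(seq))
print("C-terminal tail:", seq[-6:])
print("Cys positions:", positions(seq, "C"))
print("His positions:", positions(seq, "H"))
```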
(2) INFORMATION ON SEQUENCE SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 187 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ix) FEATURE
    (A) NAME: Pro H-asK4
    (D) OTHER INFORMATION: sequence for the NS3
        protease of HCV isolate H.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu Leu Gly
1               5                   10                  15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
            20                  25                  30

Glu Val Gln Ile Val Ser Thr Ala Thr Gln Thr Phe Leu Ala Thr Cys
        35                  40                  45

Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr
    50                  55                  60

Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn Val Asp
65                  70                  75                  80

Gln Asp Leu Val Gly Trp Pro Ala Pro Gln Gly Ser Arg Ser Leu Thr
                85                  90                  95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
            100                 105                 110

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
        115                 120                 125

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
    130                 135                 140

Leu Cys Pro Ala Gly His Ala Val Gly Leu Phe Arg Ala Ala Val Cys
145                 150                 155                 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu
                165                 170                 175

Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
            180                 185

(3) INFORMATION ON SEQUENCE SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 187 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ix) FEATURE
    (A) NAME: Pro J-asK4
    (D) OTHER INFORMATION: sequence for the NS3
        protease of HCV isolate J.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Met Ala Pro Ile Thr Ala Tyr Ser Gln Gln Thr Arg Gly Leu Leu Gly
1               5                   10                  15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Asp Gly
            20                  25                  30

Glu Val Gln Val Leu Ser Thr Ala Thr Gln Ser Phe Leu Ala Thr Cys
        35                  40                  45

Val Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Ser Lys Thr
    50                  55                  60

Leu Ala Gly Pro Lys Gly Pro Ile Thr Gln Met Tyr Thr Asn Val Asp
65                  70                  75                  80

Gln Asp Leu Val Gly Trp Pro Ala Pro Pro Gly Ala Arg Ser Met Thr
                85                  90                  95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
            100                 105                 110

Asp Val Val Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
        115                 120                 125

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
    130                 135                 140

Leu Cys Pro Ser Gly His Val Val Gly Ile Phe Arg Ala Ala Val Cys
145                 150                 155                 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Ser Met
                165                 170                 175

Glu Thr Thr Met Arg Ala Ser Lys Lys Lys Lys
            180                 185
(4) INFORMATION ON SEQUENCE SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 186 amino acids
    (B) TYPE: amino acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ix) FEATURE
    (A) NAME: Pro J8-asK4
    (D) OTHER INFORMATION: sequence for the NS3
        protease of HCV isolate J8.
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ala Pro Ile Thr Ala Tyr Thr Gln Gln Thr Arg Gly Leu Leu Gly Ala
1               5                   10                  15

Ile Val Val Ser Leu Thr Gly Arg Asp Lys Asn Glu Gln Ala Gly Gln
            20                  25                  30

Val Gln Val Leu Ser Ser Val Thr Gln Thr Phe Leu Gly Thr Ser Ile
        35                  40                  45

Ser Gly Val Leu Trp Thr Val Tyr His Gly Ala Gly Asn Lys Thr Leu
    50                  55                  60

Ala Gly Pro Lys Gly Pro Val Thr Gln Met Tyr Thr Ser Ala Glu Gly
65                  70                  75                  80

Asp Leu Val Gly Trp Pro Ser Pro Pro Gly Thr Lys Ser Leu Asp Pro
                85                  90                  95

Cys Thr Cys Gly Ala Val Asp Leu Tyr Leu Val Thr Arg Asn Ala Asp
            100                 105                 110

Val Ile Pro Val Arg Arg Lys Asp Asp Arg Arg Gly Ala Leu Leu Ser
        115                 120                 125

Pro Arg Pro Leu Ser Thr Leu Lys Gly Ser Ser Gly Gly Pro Val Leu
    130                 135                 140

Cys Ser Arg Gly His Ala Val Gly Leu Phe Arg Ala Ala Val Cys Ala
145                 150                 155                 160

Arg Gly Val Ala Lys Ser Ile Asp Phe Ile Pro Val Glu Ser Leu Asp
                165                 170                 175

Val Ala Thr Arg Ala Ser Lys Lys Lys Lys
            180                 185
(5) INFORMATION ON SEQUENCE SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 26 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: No
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(BK-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GCATACATAT GGCGCCCATC ACGGCC 26

(6) INFORMATION ON SEQUENCE SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 33 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: Yes
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(BK-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
CTACTTCTTC TTCTTGCTAG CCCGCATAGT AGT 33
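The relationship between the antisense primer of SEQ ID NO: 6 and the C-terminal tail of SEQ ID NO: 1 can be illustrated by reverse-complementing the primer and translating it. The following Python sketch is an illustration under stated assumptions: it uses the standard genetic code and assumes the reading frame starts at the first base of the reverse complement; the function names are hypothetical.

```python
# Standard genetic code, with codons enumerated in base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the first base; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ANTISENSE_PRIMER = "CTACTTCTTCTTCTTGCTAGCCCGCATAGTAGT"  # SEQ ID NO: 6, spaces removed
coding_strand = reverse_complement(ANTISENSE_PRIMER)
print(coding_strand)
print(translate(coding_strand))
```

Under these assumptions the translated peptide ends in Ala-Ser followed by four lysines and a stop codon, which agrees with the last residues listed for SEQ ID NO: 1.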

(7) INFORMATION ON SEQUENCE SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 26 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: No
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(H-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAGATACATA TGGCGCCTAT CACGGC 26


(8) INFORMATION ON SEQUENCE SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 42 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: Yes
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(H-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TTTGAATTCC TACTTCTTCT TCTTGCTAGC TCTCATGGTT GT 42

(9) INFORMATION ON SEQUENCE SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 27 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: No
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(J-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
TTTCATATGG CGCCTATCAC GGCCTAT 27

(10) INFORMATION ON SEQUENCE SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 42 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: Yes
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(J-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TTTGAATTCC TACTTCTTCT TCTTGCTAGC CCGCATGGTA GT 42

(11) INFORMATION ON SEQUENCE SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 24 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: No
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(J8-K4)S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
GGGAATTCCA TATGGCTCCC ATTACTGCT ACAC 24


(12) INFORMATION ON SEQUENCE SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS
    (A) LENGTH: 42 nucleotides
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Synthetic DNA
(iv) ANTISENSE: Yes
(vii) IMMEDIATE SOURCE: oligonucleotide synthesiser
(ix) FEATURE
    (A) NAME: PROT(J8-K4)AS
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TTTGAATTCC TACTTCTTCT TCTTGCTAGC CCGTGTGGCG AC 42
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CLAIMS 

1. Isolated and purified polypeptides containing the
HCV NS3 serine protease domain sequence, characterised in
that they have at their C-terminal end a tail of at least
three lysines.

2. Expression vectors for the production of the
polypeptides according to claim 1, characterised in that
they comprise: a polynucleotide coding for one of said
polypeptides; regulation and translation sequences
functional in said host cell, operably linked to
said polynucleotide; and, optionally, a selectable
marker.

3. A prokaryotic cell, characterised in that it is
transformed with an expression vector containing a DNA
sequence coding for the polypeptides according to claim
1, so as to allow said host cell to express the specific
polypeptide which is coded in the chosen sequence.

4. A process for the preparation of polypeptides
containing the HCV NS3 serine protease domain sequence,
characterised in that it comprises the following
operations:

- transformation of a prokaryotic host cell with an
expression vector containing a DNA sequence coding for a
polypeptide containing the HCV NS3 serine protease domain
sequence;

- growth of the prokaryotic host cell on a special
culture medium containing Zn2+ or alternatively salts of
transition metals such as Co, Cd, Mn, Cu, Ni, Ag, Fe, Cr,
Hg, Au, Pt, V;

- expression of the DNA sequence required to produce
the chosen polypeptide;

- purification of the polypeptide without having to
resort to resolubilisation protocols, and without the
need for renaturation of the protein from inclusion
bodies,

said procedure making it possible to obtain said
polypeptides in their native, soluble form suitable to
enable determination of the three-dimensional structure
of the enzyme by means of NMR or X-ray crystallography
techniques.

5. A process for the renaturation in vitro of
polypeptides containing the HCV NS3 serine protease
domain sequence, characterised in that it comprises the
following operations:

- transformation of a prokaryotic host cell using an
expression vector containing a DNA sequence coding for a
polypeptide that contains the HCV NS3 serine protease
domain sequence;

- expression of the DNA sequence required to produce
the chosen polypeptide;

- purification of the denatured polypeptide and
renaturation of the protein using buffers containing Zn2+
or alternatively salts of transition metals such as Co,
Cd, Mn, Cu, Ni, Ag, Fe, Cr, Hg, Au, Pt, V,

said procedure making it possible to obtain said
polypeptides in their native, soluble form suitable to
enable determination of the three-dimensional structure
of the enzyme by means of NMR or X-ray crystallography
techniques.
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